Implement top-k optimization #1960
Codecov Report
Coverage diff (master vs. #1960):
            master    #1960    +/-
Coverage    89.75%    89.48%   -0.28%
Files       868       881      +13
Lines       32032     32348    +316
Hits        28750     28946    +196
Misses      3282      3402     +120
Let's fix the coverage for this PR before merging
using vector_select_comparison_func =
    std::function<bool(common::ValueVector&, common::ValueVector&, common::SelectionVector&)>;

struct TopKScanState {
Leave a TODO for me to move this into mapper
I don't think we can move the initialization to the mapper.
We need access to the sortedKeyBlock, and it is not available until sorting has started.
bfc463b to a7089cf
This PR implements the optimization for top-k queries.
Sample top-k query:
MATCH (comment:Comment) RETURN comment.length, comment.creationDate ORDER BY comment.length, comment.creationDate LIMIT 5
We merge the ORDER BY and LIMIT operators into a single top-k operator.
Instead of accumulating all tuples, each thread sorts its tuples locally and keeps only its own top-k.
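The per-thread idea can be illustrated with a small standalone sketch. This is not the code in this PR: `Tuple`, `LocalTopK`, `mergeTopK`, and the "compact once the buffer exceeds 2k tuples" threshold are all hypothetical names and choices made for illustration, and the sketch uses plain `std::sort` on an in-memory vector rather than the sortedKeyBlock machinery. It assumes ascending order on (length, creationDate), as in the sample query.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical tuple: (length, creationDate), compared lexicographically.
using Tuple = std::pair<int64_t, int64_t>;

// Per-thread state: buffers tuples and periodically sorts and truncates,
// so that at most about 2k tuples are ever held in memory.
class LocalTopK {
public:
    explicit LocalTopK(std::size_t k) : k_(k) {}

    void append(const Tuple& t) {
        buffer_.push_back(t);
        // Assumed threshold: compact once the buffer exceeds 2k tuples.
        if (buffer_.size() > 2 * k_) {
            compact();
        }
    }

    // Returns this thread's top-k tuples in ascending order.
    std::vector<Tuple> finalize() {
        compact();
        return buffer_;
    }

private:
    // Local sort, then drop everything beyond the first k tuples.
    void compact() {
        std::sort(buffer_.begin(), buffer_.end());
        if (buffer_.size() > k_) {
            buffer_.resize(k_);
        }
    }

    std::size_t k_;
    std::vector<Tuple> buffer_;
};

// Combines the per-thread results and keeps the global top-k.
std::vector<Tuple> mergeTopK(const std::vector<std::vector<Tuple>>& partials, std::size_t k) {
    std::vector<Tuple> all;
    for (const auto& p : partials) {
        all.insert(all.end(), p.begin(), p.end());
    }
    std::sort(all.begin(), all.end());
    if (all.size() > k) {
        all.resize(k);
    }
    assert(all.size() <= k);
    return all;
}
```

Because any tuple in the global top-k must also be in the top-k of the thread that saw it, discarding the rest locally does not change the final result, and the merge step only has to handle at most k tuples per thread.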
Performance numbers:
https://docs.google.com/spreadsheets/d/1K53Yz8KMuvrFfXbyoPyULhWt9nSBEUI6W4WdMWirODM/edit#gid=1649459535